Korean Compound Noun Decomposition Using Syllabic Information Only
Abstract
Compound nouns are freely composed in Korean, since independent nouns can be concatenated without a postposition. Therefore, systems that handle compound nouns, such as machine translation and information retrieval, have to decompose them into single nouns for correct further analysis of texts. This paper proposes the GECORAM (GEneralized COmbination of Rule-based learning And Memory-based learning) algorithm for Korean compound noun decomposition using only syllabic information. The merit of rule-based learning algorithms is high comprehensibility, but they show low performance in many application tasks. To tackle this problem, GECORAM combines rule-based learning and memory-based learning. According to the experimental results, GECORAM shows higher accuracy than rule-based learning or memory-based learning alone.
Similar resources
A Study of Query Optimization for Korean Compound Nouns
Compound nouns are among the most frequent index terms in Korean, yet they are difficult to handle for information retrieval models developed for English. A compound noun consists of more than one noun and takes various forms. It has been regarded as a major problem for indexing and search. In particular, compound noun analysis is difficult and comp...
Korean Compound Noun Term Analysis Based on a Chart Parsing Technique
Unlike compound noun terms in English and French, where words are separated by white space, Korean compound noun terms are not separated by white space. In addition, some compound noun terms in the real world result from a spacing error. Thus the analysis of compound noun terms is a difficult task in Korean NLP. Systems based on probabilistic and statistical information extracted from a corpus ...
Segmentation of Compound Nouns using Composite Mutual Information
In Korean, a compound noun may be freely formed with or without spaces between simple nouns. The flexible word formation rule of Korean raises a serious problem in processing compound nouns with computers, in particular, in searching a dictionary with the compound noun as a search key. This paper describes a corpus-based method for segmenting a compound noun into simple nouns. Segmentation is per...
A Multi-phase Semi-supersense Tagging of Korean Unknown Nouns
Supersense tagging is the problem of finding a corresponding semantic super tag (e.g., Phenomenon, Act) based on syntactic information and annotated corpora. However, we employ semantic information rather than syntactic information and annotated corpora, because the Korean language has a relatively flexible syntactic structure and lacks annotated corpora. To construct the automatic sense tagging system for ...
Compound Noun Segmentation Based on Lexical Data Extracted from Corpus
Compound noun analysis is one of the crucial problems in Korean language processing because a series of nouns in Korean may appear without white space in real texts, which makes it difficult to identify the morphological constituents. This paper presents an effective method of Korean compound noun segmentation based on lexical data extracted from a corpus. The segmentation is done in two steps: ...